Abstract



This theoretical paper proposes a neuronal circuitry layout and synaptic plasticity
principles that allow the (pyramidal) neuron to act as a combinatorial switch: the
neuron learns to be more prone to generate spikes given those combinations of firing input
neurons for which a previous spiking of the neuron had been followed by a positive emotional
response. The emotional response is mediated by certain modulatory hormones or
neurotransmitters, e.g., dopamine. More generally, a trial-and-error learning paradigm
is suggested in which the purpose of emotions is to trigger long-term enhancement or weakening
of a neuron's spiking response to the preceding synaptic input firing pattern. Thus,
emotions provide a feedback pathway that informs neurons whether their spiking was beneficial
or detrimental for a particular input combination. The neuron's ability to discern
specific combinations of firing input neurons is achieved through a random or predetermined
spatial distribution of input synapses on dendrites that creates synaptic clusters
representing various permutations of input neurons. The corresponding dendritic segments,
or the individual spines they contain, can be particularly strongly excited, due to
local sigmoidal thresholding involving voltage-gated channel conductances, if the segment's
excitatory inputs are active at the same time as its inhibitory inputs are silent. Such nonlinear
excitation corresponds to a particular firing combination of input neurons, and it is posited
that the excitation strength reflects the combinatorial memory and is regulated by long-term
plasticity mechanisms. It is also suggested that the spine calcium influx that may
result from the spatiotemporal synaptic input coincidence may cause the spine head actin
filaments to undergo mechanical (muscle-like) contraction, with the ensuing cytoskeletal
deformation transmitted to the axon initial segment, where it may modulate the global neuron
firing threshold. The relation between the emotion-modulated multi-neuron combinatorial
switching and cognitive tasks such as associative grouping and abstract deduction is
discussed.



1 Introduction 

The field of reinforcement learning (RL) addresses the problem of sequential decision making by
an agent receiving delayed numerical rewards [1]. The field can be viewed as originating from
two major threads: the idea of learning by trial and error that started in the psychology of
animal learning (e.g., [2]), and the problem of optimal control and its solution using value
functions and dynamic programming [3]. An important application of the RL theory is the
temporal difference (TD) class of models for the phasic activity of midbrain dopamine neurons
[4, 5, 6]. The dopamine activity is believed to encode a reward prediction error (RPE) signal
that guides learning in the frontal cortex and the basal ganglia [7, 8, 9, 10]. Most scholars
active in dopamine studies believe that the dopamine signal adjusts synaptic strengths in a
quantitative manner until the subject's estimate of the value of current and future events is
accurately encoded in the frontal cortex and basal ganglia.

In contrast to the RL field, this paper considers the problem of instantaneous decision
making by an agent receiving immediate rewards. A trial-and-error learning paradigm is
suggested in which the reward signal modulates memory in (cortical) neurons that act as
combinatorial switches. The reward signal may come from an "elementary" reward generator,
such as one reflecting pain or satisfaction of hunger; it may also involve an RPE-type signal
mediated by dopamine and/or other agents that could convey positive as well as negative
reward components, as was first suggested in [12].

The first contributing thread to the presented model, as in the RL field, is the idea of 
learning by trial and error and reinforcement of favorable outcomes. The idea, as expressed in
Edward Thorndike's "Law of Effect", is: "Of several responses made to the same situation
those which are accompanied or closely followed by satisfaction to the animal will, other things 
being equal, be more firmly connected with the situation, so that, when it recurs, they will 
be more likely to recur; those which are accompanied or closely followed by discomfort to the 
animal will, other things being equal, have their connections to the situation weakened, so that, 
when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, 
the greater the strengthening or weakening of the bond." This idea is widely regarded as a
basic principle underlying much behavior [13, 14, 15, 16]. In this paper, the "satisfaction" and
"discomfort" are collectively referred to as the "emotional response".

The second contributing thread is a novel idea that, given a proper neural circuitry layout,
pyramidal neurons can process information by switching the neuron output based on active
input neuron combinations. Additional computational advantages that could make this possible
may be provided by mechanical force generated at the dendritic spines and stretch activation
of Na+ channels at the axon initial segment. An interesting feature of the presented framework
is its ability to distil reusable abstract concepts about the environment, making learning with
a low-dimensional feedback signal (the emotional response) efficient.

1.1 Problem formulation 

The following organism-level learning problem is posed. For simplicity, the neuronal activity
states are considered to be binary: "firing" or "not firing". Given an arbitrary combination S
of firing neurons in a (perhaps sensory) neuronal layer L1, activate a corresponding "optimal"
combination R*(S) of firing neurons in a (perhaps motor) neuronal layer L2 (Fig. 1(a)). The
optimal combination R*(S) is defined as one that produces the motor behavior that results
in a positive emotional response E in the organism. As such, from a combinatorics perspective,
R*(S) can be an arbitrary combination of firing L2 neurons. The emotional response
E, in biological terms, is posited to be mediated by certain modulatory neurotransmitters or
hormones that are diffusely delivered to a large number of generally trainable neurons. It is
assumed that E can be activated by evolutionarily hardwired circuits, such as when hunger is
satisfied, as well as by higher mental processes, e.g., due to the organism's subjective evaluation
of the motor behavior as satisfactory given the sensory inputs.

[Figure 1 panels: (a) input pattern S in L1, output pattern R in L2; (b) pattern S in L1, "action" or "guessing" mechanism, pattern R in L2, with branches "Yes": enhance S's excitation of R, and "No": weaken S's excitation of R.]

Figure 1: The organism-level learning problem and an outline of the suggested solution. (a)
Formulation of the problem. Neurons s_i, i = 1, ..., m, in layer L1 connect to neurons r_k,
k = 1, ..., n, in layer L2. A pattern of excitations S = {s_i}, if responded to by a pattern of
excitations R = {r_k}, elicits a positive or negative emotional response E resulting from the
interaction of the generated motor behavior with the environment. The problem is: given an
arbitrary S, excite R*(S) that would lead to positive E. (b) Outline of the suggested solution.
Learning proceeds by trial and error. Excitation of pattern S excites a pattern R(S), possibly
with the help of an "action" mechanism (e.g., depolarization delivered to all L2 neurons until a certain
level of the aggregate L2 output activity is achieved, as discussed in Sec. 2.3). A "guessing"
mechanism introduces variations in the excited patterns R. S's excitation of those R that lead
to positive (negative) E is enhanced (weakened).

It is suggested that the learning process proceeds in a trial-and-error fashion. Given a
firing combination S in L1, variations are introduced in the firing combination R in L2, with
S's excitation of those R that lead to positive E being enhanced, and S's excitation of
those R that lead to negative E being weakened (Fig. 1(b)). This suggested process is
discussed in more detail in Sec. 5. First, a more elementary learning task is considered:
given an arbitrary combination S of firing neurons in a (perhaps sensory) neuronal layer L1,
long-term strengthen the activation of a neuron r_k in a (perhaps motor) neuronal layer L2,
specifically by S, if the subsequent emotional response E is positive. Conversely, long-term weaken
the activation of r_k, specifically by S, if E is negative (Fig. 2).

2 Solution to the single-neuron combinatorial switching problem

2.1 Local dendritic integration as the basis for combinatorial memory

The following mechanism is posited as the solution and is illustrated in Figs. 3 and 4. L1
neurons connect to the dendrites of r_k at random or predetermined locations, forming spatially
localized (and possibly overlapping) "synapse neighborhoods" Nj that contain various
permutations of input neurons (Fig. 3). Sufficient depolarization of the dendritic and/or spine
interior within the jth neighborhood, caused by the temporal coincidence of activity at the
neighborhood's excitatory inputs with the absence of activity at its inhibitory inputs, produces
the memory factor Mj, which can contribute meaningfully to the r_k excitation relative to the
other drivers of neuronal stimulation. The Mj magnitude is modifiable by long-term potentiation
(LTP)-related plasticity mechanisms that are regulated not only by the known LTP-modulating
factors, such as back-propagating action potentials (BPAPs) [17, 18], but also by the emotional
response E received by the cell following the local Nj excitation. More specifically, it is
suggested that the receipt of positive (negative) E at Nj, if it was closely preceded by a BPAP
at Nj that itself immediately followed the local Nj excitation, results in a long-term enhancement
(weakening) of the Mj magnitude via LTP (long-term depression, LTD) induction in the activated
spines. That is, a positive (negative) E signals to induce LTP (LTD) in the spines that had
collectively contributed to the local Nj excitation that itself contributed to the r_k firing.

Figure 2: The single-neuron learning problem. L1 neurons s_i connect to an L2 neuron r_k.
Enhance (weaken) r_k excitation by those combinations S for which the subsequent r_k excitation
resulted in a positive (negative) E.
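The plasticity rule just described is a three-factor rule: local Nj excitation, then a BPAP, then the sign of E. The Python sketch below illustrates it under simplifying assumptions that are not part of the model as stated: inputs are binary sets of firing neuron indices, a spike of r_k is assumed to deliver a BPAP to every neighborhood, E is coded as +1/-1/0, and the `Neighborhood` class, `learning_trial` function, and `rate` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Neighborhood:
    """Hypothetical model of one synapse neighborhood N_j on neuron r_k."""
    excitatory: frozenset  # indices of L1 neurons with excitatory synapses in N_j
    inhibitory: frozenset  # indices of neurons with inhibitory synapses in N_j
    memory: float = 0.0    # memory factor M_j

    def excited(self, exc_firing, inh_firing):
        # Coincidence condition: every excitatory input fires, no inhibitory input fires
        return self.excitatory <= exc_firing and not (self.inhibitory & inh_firing)


def learning_trial(neighborhoods, exc_firing, inh_firing, r_k_fired, emotion, rate=0.1):
    """One trial of the suggested rule (simplified sketch).

    emotion: +1 for positive E, -1 for negative E, 0 if no E is received.
    Assumption: a spike of r_k sends a BPAP to every neighborhood.
    """
    excited = [nb.excited(exc_firing, inh_firing) for nb in neighborhoods]
    if r_k_fired:  # local excitation -> BPAP -> E makes N_j eligible for LTP/LTD
        for nb, was_excited in zip(neighborhoods, excited):
            if was_excited:
                nb.memory += rate * emotion  # LTP if E > 0, LTD if E < 0
    return excited  # neighborhoods lacking excitation or a BPAP stay unchanged


nbs = [Neighborhood(frozenset({0, 1}), frozenset()),
       Neighborhood(frozenset({2}), frozenset({3}))]
learning_trial(nbs, {0, 1}, set(), r_k_fired=True, emotion=+1)
learning_trial(nbs, {2}, set(), r_k_fired=True, emotion=-1)
learning_trial(nbs, {0, 1}, set(), r_k_fired=False, emotion=-1)  # no BPAP: no change
print([nb.memory for nb in nbs])  # [0.1, -0.1]
```

The third trial shows the gating role of the BPAP in this sketch. Without r_k firing, a negative E leaves the first neighborhood's memory factor untouched, even though its inputs were coincident.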

Two possible physical realizations of the combinatorial memory (i.e., the memory of r_k's
response to a specific combination of firing input neurons) are considered next. The memory
mechanism can be predominantly electrical, that is, related to the local electrical synaptic
integration in dendrites [19, 20], and/or mechanical. Regardless of the mechanism, what is
essential for the presented model is that 1) the memory factor Mj is specifically expressed
given the spatiotemporal coincidence of inputs, which gives Mj its combinatorial nature; 2)
compared to the other drivers of neuron stimulation, Mj can substantially contribute to the
r_k excitation; and 3) long-term regulation of the Mj magnitude is critically dependent on the type
of E received by the cell immediately following the Mj expression.
Figure 3: Solution to the single-neuron learning problem shown in Fig. 2. Synapses from L1
neurons form spatially localized clusters, or "neighborhoods", Nj on the r_k dendrites. In the
figure, a collection of s_i's below an Nj denotes the neurons projecting into the cluster. Activation of
excitatory synapses simultaneous with a lack of activation of inhibitory synapses in a cluster Nj
produces the memory factor Mj, which can significantly contribute to the r_k excitation. Factors
Mj generated at all neighborhoods superimpose. Mj expression is regulated as suggested
in Fig. 4.

2.2 Possible physical realizations of the combinatorial memory 
2.2.1 Mechanical (muscle-like) mechanism 

An interesting Mj realization is possible if the free calcium entering a spine during a spatiotemporal
synaptic input coincidence event elicits spine actin filament contraction, e.g., through
calcium-activated actin interaction with myosin or another actin-binding protein. The ensuing
cytoskeletal and cytoplasmic stresses, the magnitudes of which could encode the combinatorial
memory, could be transmitted along the dendritic shaft to r_k's axon initial segment
(Fig. 5). At the initial segment, these stresses, superimposed with those generated at other
dendritic sites, could regulate the global excitation threshold via stretch modulation of Na+
voltage-gated ion channels (Nav) [21, 22]. The use of mechanical force would provide a second
dimension to the neuron's computational machinery, disentangling the spatiotemporal coincidence
detection mechanism, which would be electrical and based on local nonlinear voltage
summation, from the Mj readout mechanism, which would be mechanical. The spine head
volume and the associated quantity of actin filaments would then reflect the Mj magnitude,
much as the muscle cell volume and strength reflect the memory of previous
exercise.

Several observations favor the mechanical model for pyramidal neurons. Morphologically, 
unlike many other neuron types, pyramidal neurons have rather straight dendrites that tend 
to branch at small angles. This should facilitate the transmission of the cytoskeletal and 
cytoplasmic deformation via the microtubule tug and cytoplasmic pressure spread, respectively. 
Interestingly, the dendritic shaft microtubules invade the spines, where they likely link to the 
actin cytoskeleton [23, 24]. This structural linkage, together with the rigid linear structure of
the microtubules, may also facilitate the transmission of the cytoskeletal stresses originating
in the spines. The number of spines involved can potentially range from as few as one to
as many as hundreds if entire thin terminal dendritic branches participate in the local
postsynaptic integration [19, 25, 26, 27]. The resulting stretch activation at the axon initial
segment may act in a digital manner, providing a feedback signal that a structural modification
has occurred somewhere in the dendritic tree, much like a back-propagating action potential
is thought to tell the neuron that the axon has fired. This could in principle solve the problem of
distance-dependent synaptic integration [28].

[Figure 4 flowchart: pattern S in L1; "action" or "guessing" mechanism; the corresponding set N = {Nj} of r_k is excited and the memory factors {Mj} are expressed; outcomes: enhance Mj expression, weaken Mj expression, or do not change Mj expression.]

Figure 4: Suggested learning rules for the memory factors Mj. The excitation of a neuronal
firing pattern S in L1 leads to the excitation of the corresponding set of neighborhoods Nj in
r_k. A memory factor Mj is long-term enhanced (weakened) if the neighborhood Nj is excited
by the firing of the corresponding combination of input neurons, this is closely followed by a
BPAP at Nj, and the subsequent E is positive (negative).

A pressure wave packet of 1 ms duration propagating in the cytoplasm of an unmyelinated
axon of 1 μm diameter has a velocity of 1.1 m/s and a decay length of 0.18 mm [22]. For
a cytoplasmic pressure wave packet of 1 ms duration propagating in a dendrite of 1 μm
diameter, the velocity and decay length should be similar, assuming that the specific mechanical
properties of the unmyelinated axon and the dendrite are close. The pressure wave velocity v
and decay length λ should scale with the wave frequency ω and the dendrite diameter d as

$$ v \propto \sqrt{\omega d}, \qquad \lambda \propto \sqrt{d/\omega}, $$

respectively [22] (see also [29]). Lengthening the pulse tenfold lowers ω tenfold, which divides
the velocity by √10 ≈ 3.16 and multiplies the decay length by the same factor. Therefore, a 10 ms
pressure pulse in a 1 μm-diameter dendrite should propagate at 0.35 m/s with a 0.57 mm decay
length; a 100 ms pressure pulse should have a 0.11 m/s velocity and a 1.8 mm decay length. These
values are consistent with the idea that mechanical forces can be transferred through the lengths
of the pyramidal neuron dendrites and that the forces can be produced and transmitted sufficiently
rapidly to be associated with the spike initiating event.

Figure 5: The influx of Ca2+ into a spine could elicit (muscle-like) contraction in the spine's
actin cytoskeleton. The ensuing cytoskeletal stresses are shown as green arrows, and the
cytoplasmic pressure gradient as blue arrows. The mechanical deformations could propagate
along the dendrite, as shown. At the axon initial segment, the deformations could modulate
the global neuron firing threshold.
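The scaling argument above can be checked numerically. The Python sketch below starts from the quoted 1 ms / 1 μm reference values [22] and applies v ∝ √(ωd) and λ ∝ √(d/ω). It assumes, as the text implicitly does, that ω is inversely proportional to the pulse duration. The constant and function names are illustrative only.

```python
import math

# Reference values quoted in the text [22]: 1 ms pulse, 1 um-diameter unmyelinated axon.
REF_DURATION_MS = 1.0
REF_DIAMETER_UM = 1.0
REF_VELOCITY_M_PER_S = 1.1
REF_DECAY_LENGTH_MM = 0.18


def pressure_wave(duration_ms, diameter_um=1.0):
    """Return (velocity in m/s, decay length in mm) using v ~ sqrt(omega*d)
    and lambda ~ sqrt(d/omega); omega is assumed proportional to 1/duration."""
    omega_ratio = REF_DURATION_MS / duration_ms
    d_ratio = diameter_um / REF_DIAMETER_UM
    velocity = REF_VELOCITY_M_PER_S * math.sqrt(omega_ratio * d_ratio)
    decay_length = REF_DECAY_LENGTH_MM * math.sqrt(d_ratio / omega_ratio)
    return velocity, decay_length


for duration in (1.0, 10.0, 100.0):
    v, lam = pressure_wave(duration)
    print(f"{duration:5.0f} ms pulse: v = {v:.2f} m/s, decay length = {lam:.2f} mm")
```

Rounded to two decimals, the 10 ms and 100 ms rows reproduce the 0.35 m/s / 0.57 mm and 0.11 m/s / 1.8 mm figures given in the text. The `diameter_um` argument lets one explore thicker or thinner dendrites under the same scaling assumption.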

Given these observations, it is suggested that the mechanical model for the combinatorial
memory (i.e., the mechanical modulation of the global neuron firing threshold using the local
voltage summation nonlinearity for the spatiotemporal synaptic input coincidence detection)
may have evolved relatively recently, culminating in the pyramidal neuron in higher animals,
to enable the neuron to respond more specifically to the combinatorial aspects of inputs. It
is also suggested that the spine actin cytoskeleton contractility may be local to the spine or
the neighborhood, where it may effect the readout of the combinatorial memory by stretch
modulation of the gating of membrane ion channels. Note that the mechanical memory readout
model confers functional roles on the spine head volume and the high actin content of the spine,
which are still enigmatic [28, 24, 30].

2.2.2 Electrical mechanism 

The memory factors Mj can also be related to the sigmoidal thresholding of the locally
summed postsynaptic potentials (PSPs) [19, 26]. The sigmoidal thresholding results from
the activation of voltage-gated calcium channels (Cav) and/or N-methyl-D-aspartate (NMDA)
receptor calcium conductances [19, 25, 26]. Here, the memory factors Mj can be somewhat
loosely defined as the excess of the local supralinear input summation over an "untrained"
summation (Fig. 6).

It has been shown in vitro that the spine calcium influx via NMDA receptors (NMDARs),
when closely followed by postsynaptic spikes mimicking BPAPs, causes LTP induction that
enlarges the spine [17]. Although these experiments used single-spine stimulation, it is likely
that the NMDAR calcium influx caused by a spatiotemporal synaptic input coincidence would
also elicit LTP and enlargement in the affected spines when closely followed by a BPAP.

[Figure 6 plot: local depolarization vs. number of coincident excitatory inputs.]

Figure 6: Local electrical summation of PSPs. Black curve: local depolarization as a function
of the number of coincident excitatory inputs for "untrained" summation. The nonlinear
boosting is due to the activation of local voltage-gated conductances [19, 25, 26]. Blue: the
boosting threshold is lowered and the summation plateau is raised as a result of the LTP-associated
increase in the number of AMPA receptors in a neighborhood synapse. Red: the
boosting threshold is little changed while the summation plateau is raised as a result of the
LTP-associated increase in the number of Cav channels in a neighborhood synapse.

The LTP and the associated spine enlargement could confer the combinatorial memory in
two ways. First, the LTP-associated increase in synapse efficacy, via an increase in the number
of α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) glutamate receptors [28],
should lower the local boosting threshold and raise its summation plateau (Fig. 6). Second,
the summation plateau should be raised as the peak Cav conductance increases owing to the
greater number of Cav channels in the non-synaptic-cleft spine membrane, given that the linear
spine dimensions are enlarged. In the latter case, the sigmoidal summation threshold is not
likely to be significantly altered, as the Cav conductance is activated only once the threshold
depolarization for Cav activation is reached. It should be noted that the spines in which the combinatorial
memory Mj can be structurally realized via LTP should probably be located centrally in the
neighborhood Nj, so that their boosted excitation is more likely to be indicative of a synaptic
input coincidence in the neighborhood.
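To make the two LTP scenarios of Fig. 6 concrete, the Python sketch below models the local summation as a logistic function of the number of coincident excitatory inputs. It takes Mj as the excess of the trained curve over the untrained one. The logistic form and every parameter value (threshold, plateau, slope) are hypothetical choices that only mimic the qualitative shapes described above; they are not fitted to data.

```python
import math


def local_summation(n_inputs, threshold, plateau, slope=2.0):
    """Sigmoidal local depolarization (arbitrary units) vs. number of coincident
    excitatory inputs in a neighborhood N_j; a stand-in for the curves of Fig. 6."""
    return plateau / (1.0 + math.exp(-slope * (n_inputs - threshold)))


# Hypothetical parameter sets, chosen only to mimic the qualitative shapes in Fig. 6
UNTRAINED = {"threshold": 5.0, "plateau": 10.0}  # black curve
AMPA_LTP = {"threshold": 4.0, "plateau": 12.0}   # blue: threshold lowered, plateau raised
CAV_LTP = {"threshold": 5.0, "plateau": 12.0}    # red: threshold unchanged, plateau raised


def memory_factor(n_inputs, trained):
    """M_j, loosely defined as the excess of the trained over the untrained summation."""
    return local_summation(n_inputs, **trained) - local_summation(n_inputs, **UNTRAINED)


for n in range(0, 11, 2):
    print(n, round(memory_factor(n, AMPA_LTP), 2), round(memory_factor(n, CAV_LTP), 2))
```

In this toy version both scenarios give Mj close to zero for few coincident inputs and close to the plateau difference for many. Only the AMPA-type change produces a large Mj near the untrained boosting threshold, because it shifts the threshold itself.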
2.3 Neuronal circuitry layout

Within the framework outlined above, the output of r_k can be approximately described as

$$
r_k = r_k\left( \sum_i w_i x_i + \sum_j W_j \left[ F_j\Big(\sum_i v_{ij} x_i\Big) + M_j^{(e)}\Big(\sum_i v_{ij} x_i\Big) \right],\ \sum_j W'_j\, M_j^{(m)}\Big(\sum_i v_{ij} x_i\Big) \right), \tag{1}
$$

where $x_i$ represents the ith neuron input, $w_i$ its linear weight in r_k's initial segment
depolarization, $v_{ij}$ the weight of $x_i$ in the Nj depolarization, $F_j(\cdot)$ the excess of the "untrained"
local sigmoidal voltage summation at the neighborhood Nj over the linear summation ($F_j(\cdot)$
is negative in the sublinear regime), $M_j^{(e)}(\cdot)$ the "electrical" combinatorial memory factor
contribution to the Nj depolarization, $W_j$ the weight of $F_j$ and $M_j^{(e)}$ in r_k's initial segment
depolarization, $M_j^{(m)}(\cdot)$ the "mechanical" combinatorial memory factor generated at the
neighborhood Nj, and $W'_j$ the weight of $M_j^{(m)}$ in the modulation of the initial segment ion channels.
As a function of its first argument, $r_k(\cdot)$ behaves similarly to a two-layer neural network with
a global sigmoidal output nonlinearity [19, 25, 26], while the second argument modulates the
global sigmoid shape and threshold position. In the following, $M_j^{(e)}$ and $M_j^{(m)}$ are collectively
referred to as Mj.

Fig. 7(a) shows a suggested neuronal circuitry layout for the learning process and the
combinatorial memory readout. It is assumed that in an untrained organism presented with
an S in L1, the classical postsynaptic integration coupled with the factors Mj does not suffice to
excite r_k. In a learning trial, r_k is activated by additional depolarization created by increased
excitation or reduced inhibition from one or more "guess" (G) or "action" (A) neurons that
connect to r_k in dominant positions, such as near the axon initial segment. Alternatively, the
"guess" or "action" neurons could connect to r_k at the apical tuft, where they could generate
the Ca2+ dendritic spikes that propagate towards the soma and drive the initiation of action
potentials [28]. It is suggested that each of the "motor" L2 neurons is structurally connected
to L1 in a similar, although not necessarily identical, manner, such as when closely spaced L2
neurons sprawl basal dendrites in the same plane, into which L1 axons diffusely and randomly
project (Fig. 7(a)). This connectivity would be conducive to increasing the learning power of
the system, as each L2 neuron would be roughly equal in its ability to learn how to react to
an arbitrary combination S. In an untrained system, given an S in L1, all L2 neurons should
be similarly close to the activation threshold as described by Eq. (1). Following learning, the
L2 neurons trained to react positively to S should be closer to the activation threshold than
others. The actual combinatorial memory readout could proceed using the "action" neurons
that deliver similar rising levels of depolarization to all L2 neurons, e.g., via somatic or apical
tuft connections, until a certain level of the aggregate L2 output activity is achieved.
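The readout procedure described above can be sketched as a simple loop. The Python below makes several simplifying assumptions. The untrained nonlinearity F_j and the mechanical term are dropped. Neighborhoods are excitatory-only index tuples that count as excited when all their inputs fire. "A certain level of aggregate activity" is interpreted as a hypothetical count `target_active` of firing L2 neurons. The functions `drive` and `action_readout`, the step size, and the threshold value are illustrative names and numbers, not taken from the source.

```python
def drive(x, w, clusters, W, M):
    """Simplified first argument of the r_k model for one L2 neuron.

    x: binary L1 firing vector; w: linear weights w_i;
    clusters: tuples of L1 indices forming each neighborhood N_j (excitatory only);
    W: neighborhood weights W_j; M: memory factors M_j.
    The untrained nonlinearity F_j and the mechanical term are omitted here.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    memory = sum(Wj * Mj for cl, Wj, Mj in zip(clusters, W, M)
                 if all(x[i] for i in cl))
    return linear + memory


def action_readout(drives, threshold=1.0, target_active=2, step=0.05, max_steps=1000):
    """Deliver the same rising depolarization to every L2 neuron until at least
    target_active neurons cross threshold; return (firing indices, extra depolarization)."""
    firing, extra = [], 0.0
    for k in range(max_steps + 1):
        extra = k * step
        firing = [i for i, d in enumerate(drives) if d + extra >= threshold]
        if len(firing) >= target_active:
            break
    return firing, extra


firing, extra = action_readout([0.2, 0.93, 0.55, 0.72])
print(firing, round(extra, 2))  # [1, 3] 0.3
```

Because the common depolarization is the same for every neuron, the neurons that fire first are simply those whose learned drive for the current S is highest. This is the sense in which trained L2 neurons sit "closer to the activation threshold".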
Figure 7: Suggested neuronal architecture for the learning process and the combinatorial
memory readout. (a) Learning with two layers, L1 (red) and L2 (black). Input from L1
diffusely projects into L2 dendrites that are arranged in a layer. The "guess" and "action"
inputs (green) connect at the soma or the apical tufts. (b) Learning with an added intermediate
layer L_I (blue). Layer L1 diffusely projects into both L2 and L_I, while L_I diffusely projects
into L2. The "guess" and "action" neurons connect at the apical tufts. First, L_I neurons learn
to fire for the combinations S in L1 that are important to the organism. The reduced-dimensionality
signals are then used in further L2 learning. In the neocortex, L2 could correspond to the
layer V pyramidal neurons, and L_I to the layer II/III pyramidal neurons.

3 Abstract deduction and associative grouping stemming from combinatorial switching

Consider a situation in which L1 contains m excitatory neurons, each of which drives excitation
of a corresponding inhibitory neuron in layer L1', and both L1 and L1' neurons innervate L2.
Then, for an arbitrary L1 combination $s_j^{(p)}$ consisting of p neurons, with q firing neurons and
$p - q$ silent neurons ($m \ge p \ge q \ge 1$), it is in principle possible to construct a local synaptic
neighborhood $N_j^{(p)}$ that expresses the nonlinear memory factor $M_j^{(p)}$ if and only if $s_j^{(p)}$ is
realized in the p neurons. In its simplest form, such an $N_j^{(p)}$ consists of q excitatory connections
from the q firing neurons and $p - q$ inhibitory connections from the L1' neurons that are driven
by the $p - q$ silent L1 neurons, with the neighborhood geometry and membrane conductances
tuned such that the memory factor Mj is expressed only when all excitatory and no inhibitory
synapses in Nj are active.
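The construction above can be illustrated with a small exhaustive check. In the Python sketch below, a neighborhood for a target combination receives excitatory synapses from its firing neurons and inhibitory synapses via the L1' relays of its silent neurons. The script confirms that, among all 2^p input combinations, only the target expresses Mj. The one-to-one relay (an L1' neuron fires exactly when its L1 partner fires) and the helper names are assumptions made for this illustration.

```python
from itertools import product


def build_neighborhood(target):
    """Hypothetical N_j^(p) for a target combination over p L1 neurons (tuple of 0/1):
    excitatory synapses from the neurons firing in the target, inhibitory synapses
    from the L1' relays of the neurons silent in the target."""
    excitatory = [i for i, s in enumerate(target) if s == 1]
    inhibitory = [i for i, s in enumerate(target) if s == 0]
    return excitatory, inhibitory


def expresses_memory(neighborhood, pattern):
    """M_j is expressed only if all excitatory and no inhibitory synapses are active.
    Assumption: each L1' neuron fires exactly when its L1 partner fires."""
    excitatory, inhibitory = neighborhood
    return all(pattern[i] for i in excitatory) and not any(pattern[i] for i in inhibitory)


target = (1, 0, 1, 1)  # p = 4 neurons, q = 3 firing, p - q = 1 silent
nb = build_neighborhood(target)
matches = [s for s in product((0, 1), repeat=len(target)) if expresses_memory(nb, s)]
print(matches)  # [(1, 0, 1, 1)]
```

Dropping the inhibitory connection would make the same neighborhood also respond to the superset (1, 1, 1, 1). This is why, in this construction, the L1' relays are needed for an exact "if and only if" detector.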
More generally, it is evident that the neighborhoods Nj can be the sites of deduction and
storage of information about combinatorial aspects of inputs. Suppose that the combinations
$s_1^{(p)}, s_2^{(p)}, \ldots, s_t^{(p)}$ realized in a certain set of p neurons in L1, if responded to by r_k activation,
elicit positive E, while some other combinations $s_{t+1}^{(p)}, s_{t+2}^{(p)}, \ldots$ in the p neurons, if responded
to by r_k activation, elicit negative E. Further, suppose that the combinations $s_1^{(m-p)}, s_2^{(m-p)}, \ldots$
in the remaining $m - p$ neurons in L1 do not affect E if responded to by r_k activation. It
is easy to show that, after a series of learning trials in which r_k is activated (e.g., by G) following
the presentation of random sets of inputs (i.e., random $S \in S^{(p)} \times S^{(m-p)}$), those Mj that
reflect the features of $s_1^{(p)}, s_2^{(p)}, \ldots, s_t^{(p)}$ are likely to be the most strengthened, while those
Mj that reflect the features of $s_{t+1}^{(p)}, s_{t+2}^{(p)}, \ldots$ are likely to be the most weakened.

Indeed, if all possible combinations $S^{(p)} \times S^{(m-p)}$ are presented to the organism sequentially,
each followed by r_k activation, it follows from the suggested rules for Mj plasticity that
those Mj that reflect the features of $s_1^{(p)}, s_2^{(p)}, \ldots, s_t^{(p)}$ will be the most strengthened, while
those Mj that reflect the features of $s_{t+1}^{(p)}, s_{t+2}^{(p)}, \ldots$ will be the most weakened: each
$s^{(p)}$ is paired with every $s^{(m-p)}$, so the E values following a given $s^{(m-p)}$ mix positive and
negative outcomes, whereas those following a given $s^{(p)}$ all share one sign. Therefore,
if the random sampling of $S \in S^{(p)} \times S^{(m-p)}$ is unbiased, the above probabilistic statement
regarding learning is true.
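The exhaustive-presentation argument can be reproduced with a toy calculation. The Python sketch below presents every S in S^(p) × S^(m-p) once and always activates r_k. It keeps one idealized memory factor per sub-combination of the first p neurons and one per sub-combination of the remaining m - p neurons. The choice of p = 2, m = 3, the set of positive combinations, and the unit learning rate are all hypothetical.

```python
from itertools import product


def exhaustive_training(p, m, positive, eta=1.0):
    """Present every S in S^(p) x S^(m-p) once, activate r_k each time (e.g., by G),
    and update idealized memory factors: one per sub-combination of the first p
    neurons and one per sub-combination of the remaining m - p neurons."""
    rel_patterns = list(product((0, 1), repeat=p))
    irr_patterns = list(product((0, 1), repeat=m - p))
    M_rel = {s: 0.0 for s in rel_patterns}
    M_irr = {s: 0.0 for s in irr_patterns}
    for s_rel in rel_patterns:
        for s_irr in irr_patterns:
            E = 1.0 if s_rel in positive else -1.0  # E depends only on the p neurons
            M_rel[s_rel] += eta * E  # cluster matching s_rel was excited, BPAP followed
            M_irr[s_irr] += eta * E  # same for the cluster matching s_irr
    return M_rel, M_irr


M_rel, M_irr = exhaustive_training(p=2, m=3, positive={(1, 1), (1, 0)})
print(M_rel)  # {(0, 0): -2.0, (0, 1): -2.0, (1, 0): 2.0, (1, 1): 2.0}
print(M_irr)  # {(0,): 0.0, (1,): 0.0}
```

The memory factors tied to the E-relevant neurons end up strengthened or weakened according to their outcome. The factors tied to the irrelevant neurons cancel out, matching the claim in the text under these idealized assumptions.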

In the above example, the strengthening of the factors Mj corresponding to, e.g., the
pattern $s_1^{(p)}$ can be considered a form of abstract deduction, or deductive reasoning, about
which elements of the inputs $S^{(p)} \times S^{(m-p)}$ need to be responded to by r_k activation in order
to achieve a positive E. Note that the pattern $s_1^{(p)}$ may not be realizable in L1 without some
context provided by $S^{(m-p)}$. Therefore, $s_1^{(p)}$ and the corresponding {Mj} may represent an
abstract concept that does not occur on its own in the input environment. Also, the training
of r_k creates a form of associative grouping, or pattern classification. Indeed, $s_1^{(p)}, s_2^{(p)}, \ldots,
s_t^{(p)}$ is a group of L1 excitation patterns that become associated in their excitatory action on
r_k. It is interesting that the same cell r_k can learn how to respond to a variety of unrelated
input combinations, as each neighborhood Nj is structurally independent.

The more specific Mj's exist for a pattern, the more specific the memory of r_k can
be. Let us assume that r_k's clusters Nj consist of excitatory synapses only and that, in
order to express a cluster's memory factor Mj, all the cluster's synapses have to be excited.
The following question is posed: for L1 consisting of p excitatory neurons, a typical firing
pattern size q ($q < p$), C synapses per cluster, and random L1-to-r_k connectivity, what is the
frequency of occurrence of clusters that completely represent a particular L1 excitation
pattern? Approximately, the frequency of occurrence of clusters that consist of inputs all
belonging to a particular pattern of size q is $(q/p)^C$, while one needs at least $q/C$ clusters to
encode a pattern of size q, leading to $(q/C)(p/q)^C$ clusters, or $q(p/q)^C$ synapses. Note that, as
expected, for C = 1, i.e., non-clustered inputs, this leads to $q(p/q) = p$ synapses needed to
represent an arbitrary pattern. For q = C, i.e., exact pattern encoding, one needs $q(p/q)^q$ synapses.

Using the above formulas and assuming, for the sake of argument, p = 100 and q = 10, the
number of a neuron's input synapses needed to fully represent an arbitrary pattern is $10^{C+1}$.
Comparing this to I = 50,000, the number of contacting synapses for a pyramidal neuron [28],
would suggest a cluster size of not more than 3 to 4. Note that for p = 100, q = 10 and
C = 3, roughly $(I/C)(q/p)^C \approx 16.7$ clusters are excited on average on a single output neuron
for the typical L1 firing pattern. For larger cluster sizes, one would need roughly
$q(p/q)^C / I = 10^{C-3}/5$ output neurons similarly connected to L1 in order for at least one of
them to have a full representation of an input firing pattern of size q.
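The cluster-counting estimates above reduce to a few one-line formulas. The Python sketch below encodes them so that the numbers in this section and in the word example that follows can be recomputed. I = 50,000 synapses per neuron is taken from the text. The function names are illustrative, and `max_cluster_size` applies a strict "fits on one neuron" cutoff, which is a simplifying assumption.

```python
def synapses_needed(p, q, C):
    """Synapses r_k needs to fully represent an arbitrary size-q pattern among p inputs:
    (q/C) matching clusters / frequency (q/p)**C, times C synapses per cluster."""
    return q * (p / q) ** C


def max_cluster_size(p, q, synapses_per_neuron=50_000):
    """Largest integer C for which synapses_needed(p, q, C) fits on one neuron."""
    C = 1
    while synapses_needed(p, q, C + 1) <= synapses_per_neuron:
        C += 1
    return C


def clusters_excited(p, q, C, synapses_per_neuron=50_000):
    """Average number of clusters fully excited by a typical size-q pattern: (I/C)(q/p)^C."""
    return (synapses_per_neuron / C) * (q / p) ** C


def neurons_needed(p, q, C, synapses_per_neuron=50_000):
    """Output neurons needed for at least one full representation of a pattern."""
    return synapses_needed(p, q, C) / synapses_per_neuron


print(max_cluster_size(100, 10))               # 3
print(round(clusters_excited(100, 10, 3), 1))  # 16.7
print(max_cluster_size(50, 5))                 # 4 (the word example: q = 5, p = 50)
```

With the strict cutoff, p = 100 and q = 10 give C = 3 (C = 4 would already need 10^5 synapses, i.e., two neurons), which sits at the lower end of the "3 to 4" range quoted above.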

On the other hand, as discussed in Sec. 6, most of the learned activity of higher organisms 
could be considered a form of combinatorial switching if the combinatorial switching idea is 
taken to its logical extreme. Then language, as an artificial human construct designed
for ease of communication, should reflect the switching dynamics that the involved neurons
and their neighborhoods are "comfortable" operating on. There are about 40-50 sounds in
a typical language and 4-5 sounds in a typical word, which may suggest, roughly, 4-5 as the
typical pattern size (q) and 40-50 as the number of input neurons (p). Here, for simplicity,
the issue of sound ordering within words is ignored. Using the formulas above with q = 5
and p = 50, the approximate number of contacting synapses per neuron needed to represent
an arbitrary word is $5 \cdot 10^C$. Again, comparing this to the experimentally observed 50,000
synapses per neuron suggests a cluster size of not more than 4. Note that one could also
hypothesize that q (and p) can be related to the number of syllables comprising a typical word, 
the number of words in a typical sentence, and the number of the elements in the writing of 
a typical letter. 

4 Training of intermediate layers 

To further reduce the dimensionality of inputs, through encoding of frequently occurring
neuronal excitation patterns that are significant to the organism, an intermediate layer could be
trained using the neuronal architecture suggested in Fig. 7(b). The "sensory" layer L1 axons
randomly project into the basal (and possibly proximal apical) dendrites of both the intermediate
layer L_I and the "motor" layer L2, while the L_I axons randomly project into the L2
basal (and possibly proximal apical) dendrites. The apical tufts of both L_I and L2 receive the
guessing and action driving signals emanating from distal brain areas. Note that the sprawling
planar arrangement of basal dendrites of L_I and L2 neurons and a random distribution of
contacting synapses should increase the learning power of the system, as discussed in Sec. 2.3.

First, some L2 neurons are trained to respond to certain combinations of firing L1 neurons. 
As a side effect, some Li neurons learn to become more responsive to the frequently occurring 
L1 patterns that are followed by positive E. This learning could occur even in the absence of 
BPAP in the Li neurons if LTP can be induced solely by local (e.g., NMDA-recruiting) 
postsynaptic boosting followed by the receipt of positive E. The trained Li neurons could then 
themselves be fired by the guessing or action mechanisms upon presentation of the learned 
L1 patterns, thus making reduced-dimensionality inputs available to L2 neurons for further 
learning. 
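One way to read the intermediate-layer rule above is as a three-factor update: a dendritic cluster on an intermediate-layer neuron is potentiated only if the sensory pattern locally boosted it and a positive emotional signal E followed, with no back-propagating action potential required. The sketch below is a hypothetical illustration of that reading. The function name `train_intermediate`, the integer weight step and the episode format are assumptions, not details given in the text.

```python
from collections import Counter

def train_intermediate(episodes, ltp_step=1):
    """Hypothetical three-factor LTP for one intermediate-layer neuron.

    episodes: iterable of (boosted_cluster_ids, emotion) pairs, where the ids are
    the clusters that a sensory-layer pattern locally boosted (e.g., via NMDA
    recruitment) and emotion is the subsequent E signal (>0 positive).
    No somatic spike or BPAP enters the rule: local boosting plus E suffices.
    """
    weights = Counter()
    for boosted, emotion in episodes:
        if emotion > 0:                 # third factor: positive E
            for k in boosted:           # only locally boosted clusters are eligible
                weights[k] += ltp_step
    return weights

# A frequently occurring pattern that is followed by positive E accumulates weight.
w = train_intermediate([({1, 2}, 1), ({1, 2}, 1), ({3}, 0), ({1}, -1)])
print(dict(w))
```

Clusters 1 and 2, which were boosted by the repeated pattern each time positive E followed, end with weight 2. Cluster 3 (no emotional signal) and the negatively followed episode leave no trace, because this sketch models only the potentiation described in the text.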
5 Solution to the multi-neuron combinatorial switching problem 

As suggested in Sec. 1, the organism-level learning problem (finding the optimal R*(S), Fig. 1(a)) 
is solved by a trial-and-error search in which variations are introduced in the firing combination 
R. As discussed in Sec. 2.3, the signals driving the trial variations could come in the form 
of Ca2+ dendritic spikes originating in the apical tufts. The apical tufts of pyramidal 
neurons in cortical layers II/III and V are known to receive input from distal cortical areas 
and nonspecific thalamic projections, with the inputs generally having different origins than 
those that form synapses with more proximal apical or basal dendrites [28]. 

A proper allocation of behaviors to various L2 neurons (or groups of neurons) can increase 
learning efficiency. For example, assume that L2 has n trainable binary-state neurons. A random 
search for an optimal combination R*(S) for a certain S, assuming for simplicity that only a 
single R*(S) exists, would take on the order of 2^n trials. This compares to only n trials if one 
neuron can be trained at a time in any order, or roughly n(n+1)/2 trials if one neuron can be 
trained at a time but in a particular order that also has to be found by trial and error. The 
latter training strategies would be possible if L2 neurons drove complementary motor behaviors, 
such as movements of legs and arms, or coarse and fine movements of a leg. The layout of L2 and 
Li neurons that is optimal for learning is likely subject to strong evolutionary pressure. It has 
so far been assumed that each neuron learns independently; however, the excitation of patterns 
in L2 could be more coordinated. For example, Li neurons could drive the excitation of multiple 
L2 neurons. 
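The three search regimes compared above can be tabulated directly. This is a minimal sketch under stated assumptions: the unknown-order count assumes the search tries every remaining neuron at each stage, n + (n - 1) + ... + 1 = n(n + 1)/2 trials in the worst case, and the one-at-a-time count follows the text's figure of n. The function names are illustrative.

```python
def trials_joint_random(n):
    """Unstructured search over all joint states of n binary neurons: ~2**n."""
    return 2 ** n

def trials_one_at_a_time(n):
    """Each neuron trained independently, in any order: ~n trials (as in the text)."""
    return n

def trials_unknown_order(n):
    """One neuron at a time, but the right order must itself be found:
    n + (n - 1) + ... + 1 = n * (n + 1) // 2 trials in the worst case (assumed)."""
    return n * (n + 1) // 2

for n in (5, 10, 20):
    print(n, trials_joint_random(n), trials_one_at_a_time(n), trials_unknown_order(n))
```

Even for a modest n = 20, the joint search needs over a million trials, whereas the ordered search needs 210. This gap is why the allocation of behaviors to neurons matters so much for learning efficiency.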

In mammals, the "combinatorially trainable" layer L2 and Li neurons are likely located 
primarily in the neocortex, where they can store complex behaviors. It is suggested that the 
hippocampus, which is situated at the edges of the neocortex and receives indirect projections 
from it, is one of the major sites of generation of basic cognitive and higher emotions, based 
on its observation of neocortical and other brain activity, including the more "primary" 
emotions generated in other brain regions. The emotions generated in the hippocampus 
should include the positive emotion, or internal feeling, associated with "understanding", or 
"realization", of something, the expression of which modulates the memorization of the 
immediately preceding behaviors (i.e., L2 and Li excitation patterns) in the neocortex (Fig. 8). 
The mechanism by which emotions are generated, which in humans and higher animals probably 
involves an RPE-type signal [7] [8] [9] [10], is not the main subject of this paper. However, 
several possible mechanisms are discussed below. 

In one of its simpler forms, the emotion accompanying "understanding" could be elicited 
following the accomplishment, for the first time, of a behavior that leads to a positive emotion. 
For example, a toddler may accidentally (or through the built-in "guessing", or "creativity" 
mechanism) realize how to turn on a television. In agreement with the suggested model, the 
child then becomes very likely to turn on the television each time he/she sees it turned off. 
The most practical way of interrupting this behavior is to elicit negative emotions in the child 
via punishment. 

A more complicated form of "understanding" may be expressed when an organism, given a 
situation it is in, arrives at a more positive emotion than in a previous experience, or arrives at 
the same positive emotion as before, but in a simpler, faster, or, more generally, less negatively 
Figure 8: The suggested process of learning in mammals. The diagram is an elaboration of 
the process described in Fig. 1(b). The hippocampus plays the role of an "observing body" 
that generates higher emotions. The "primary" emotions shown may also be partly generated 
in the hippocampus. The framework suggests that higher emotions may have evolved 
structurally as an extension of more elementary feelings such as pain or hunger. 

perceived way. For example, a dog may discover that in a certain circumstance there is a 
simpler, or faster, way to obtain food than was previously achievable. If the physical effort 
or the time spent waiting for food is perceived as an added negative emotion, all these 
scenarios can be considered part of a general scenario wherein an organism, given a 
situation, arrives at a more positive "aggregate" emotion than previously. 

From the perspective of the hippocampus, the dog's discovery above may be described as 
follows: given the pattern of neocortical and other synaptic inputs that represents the 
situation the dog is in, a more positive aggregate "primary" emotion associated with food is 
observed than in previous experience. Note that the hippocampus does not need to be aware of 
the trial excitation of various behaviors; it can act merely as a spectator that compares the 
evolution of the emotions accompanying the current situation to those recorded previously in 
similar situations. Whenever a particular circumstance is followed by a more positive aggregate 
primary emotion than had been recorded previously, an added positive emotion such as 
"surprise" or "understanding" is generated, both to memorize, for future use, any action the 
organism may have taken to cause that effect and possibly to draw attention to the action that 
enabled the effect. Conversely, negative emotions are generated if the circumstance does not 
result in a similar positive primary emotion within a comparable amount of time or effort 
spent. The hippocampus may employ the combinatorial switching principles discussed here to 
classify synaptic input patterns. However, it would also need the ability to evaluate the 
temporal relationships between inputs and to gauge the magnitude of primary emotions. 
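The "spectator" role described above can be caricatured as a comparator that remembers, per situation, the best aggregate emotion seen so far. The sketch below is hypothetical and deliberately simpler than a temporal-difference RPE model. The class and function names are assumptions, effort and waiting are treated as negative emotions as the dog example suggests, and a situation that has never been recorded is assumed to start from a baseline of 0.

```python
def aggregate_emotion(primary, effort=0.0, waiting=0.0):
    """Net emotion: primary reward minus effort and waiting (treated as negative)."""
    return primary - effort - waiting

class HippocampalComparator:
    """Hypothetical observing body: compares each outcome with the best
    aggregate primary emotion previously recorded for the same situation."""

    def __init__(self, baseline=0.0):
        self.best = {}            # situation key -> best aggregate emotion so far
        self.baseline = baseline  # assumed value for never-recorded situations

    def observe(self, situation, outcome):
        """Return +1 ('understanding'/'surprise'), -1 (negative emotion) or 0."""
        previous = self.best.get(situation, self.baseline)
        if outcome > previous:
            self.best[situation] = outcome
            return +1             # better than ever before: memorize the action
        if outcome < previous:
            return -1             # worse than the recorded outcome
        return 0                  # nothing new learned

h = HippocampalComparator()
print(h.observe("kitchen", aggregate_emotion(10, effort=4)))  # first success
print(h.observe("kitchen", aggregate_emotion(10, effort=1)))  # faster way found
print(h.observe("kitchen", aggregate_emotion(10, effort=4)))  # back to the old way
```

The comparator never inspects which behavior produced the outcome. It only sees situations and emotions, which matches the point that the hippocampus need not be aware of the trial excitations.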

The higher human emotion associated with finding a solution to a problem, e.g., a mathematical 
problem, may be generated in a rather similar way. It is suggested that humans think, including 
when solving problems such as mathematical problems, by a subconscious or nearly subconscious 
trial-and-error search for functional connections, or for functional connections simpler than 
those previously known, between various concepts that are represented as neuronal excitation 
patterns. When the sought-for logical connection or result is found (in other words, when a 
connection or result that had previously appeared impossible or difficult to achieve is found), 
the positive emotion is "automatically" generated by an observing body such as the hippocampus 
to solidify the logical path taken in memory, as well as possibly to draw attention to the found 
solution (Fig. 9). Greater simplifications, e.g., when a law of nature is discovered, likely lead 
to more positive emotions and better memorization, consistent with the Law of Effect [2]. Here 
the emotions generated in the hippocampus modulate the storage of logical transitions (thoughts) 
in the neocortex. 

Figure 9: The suggested process of finding a solution to a problem, such as a mathematical 
problem. Circles correspond to concepts that are physically represented as neuronal excitation 
patterns. Arrows represent functional connections between the neuronal excitation patterns. 
The solving process consists of a (semi-)subconscious search for functional connections between 
various concepts that are simpler than those previously known. The randomized trials are driven 
by the "guess" and "action" neurons. If the sought-for connection between A and B is found, the 
hippocampus generates a positive emotion that causes memorization of the logical path taken. 

In the presented framework, the main role of (conscious) emotions is to regulate the processes 
of learning and long-term memory; this is a much deeper role for emotions than is currently 
accepted. Indeed, emotions seem to accompany all conscious learning experiences: new things 
learned are accompanied by positive emotions, and unsuccessful actions or failures to 
understand an idea by negative emotions. An interesting corollary is that the more neurons a 
biological system has and the greater its learning abilities, the more emotions or feelings the 
system should be able to experience at a physiological level. On a more philosophical note, 
emotions, or feelings, are probably tightly related to consciousness. In fact, from the 
subjective perspective, the only "real" or "tangible" objects may be feelings and emotions. 
Therefore, the conscious mind and self-perception probably exist not in the familiar 
three-dimensional space but in the learning-oriented space of feelings and emotions. 

6 Discussion 

Many actions of humans and higher animals seem to fit the following paradigm: given a 
combination of sensory inputs, generate an action appropriate to that combination, where the 
mapping can be altered through learning. It would be an elegant solution on nature's part if 
individual neurons, with some help from auxiliary neuronal circuitry, in fact exhibited this 
basic behavior, expressed at the single-neuron level as combinatorial switching of the neuron's 
output. Indeed, pyramidal neuron connectivity suggests just that: barring the need for system 
redundancy, why would a neuron's axon make multiple, seemingly randomly distributed connections 
with another neuron's dendrites unless there were a combinatorial aspect that is used? 

On the other hand, it is widely accepted that higher organisms try to learn to respond to 
the environment's inputs so as to achieve positive and avoid negative feelings and emotions 
[2] [13] [14] [15] [16], and that following these subjective learning goals is ultimately 
connected to the achievement of the organism's survival and evolutionary objectives. 

The idea that emotions play a critical role in learning can be illustrated with the following 
example. Consider a toddler learning how to kick a ball to hit a real or imaginary target 
(creating an implicit, or procedural, memory) by repeatedly kicking the ball and observing its 
trajectory. What mechanism causes the motor activity associated with more successful trials to 
be memorized better than that associated with less successful trials, thus allowing the 
technique to improve? One could suggest that the child consciously and voluntarily, using some 
mental picture of the process, chooses to remember the movements associated with more 
successful trials. This would likely require a corresponding cognitive mechanism implemented at 
the neural level. However, this paper suggests that the positive emotions accompanying the 
child's realization that an attempt was successful already provide a convenient mechanism for 
relaying a long-term memorization signal for the preceding spiking response to the neurons 
responsible for the more advantageous behavior. Indeed, the reason that emotional responses in 
humans and higher animals are delivered to a large number (or all, via hormones) of trainable 
neurons [9] may be that the exact location of the neurons being trained, given the complexity 
of the sensory-motor signal flows, is not easily identifiable from the perspective of the 
emotion-generating systems, which themselves may be scattered throughout the nervous system. 

An interesting question is: why would the paradigm of combinatorial switching, of which the 
ability to classify input patterns into output patterns can be considered a multi-neuron 
implementation, be successful in our world? The answer appears to be that, from a fundamental 
perspective, the world around us is indeed usefully classifiable, largely because of the 
repeating motifs in the terrestrial environment, the organization of life into similarly 
behaving species, and the similarities across species. (On an even deeper level, these may be 
viewed as stemming from the invariance of physical laws in space and time.) A wolf that has 
learned how to catch a rabbit is more likely to catch a second rabbit, as well as another 
similar animal, in a similar terrestrial environment. The key to efficient learning with a 
low-dimensional feedback signal (the emotional response) may be the ability to distil reusable 
concepts in relatively few learning trials. 

As an illustration of these ideas, consider the following simple learning model. An untrained 
and hungry test subject has 12 sensory neurons connected to 3 motor neurons. All the neurons 
operate in an "on" or "off" regime. The subject is seated at a table on which apples (rounded 
symmetrical shape, stem on top, smooth surface) or stones (rounded symmetrical shape, no 
stem on top, rough surface) are placed one at a time. Each apple or stone has 1 of 3 sizes 
(small, medium or large) and 1 of 3 colors (red, yellow or green). Each of the 3 motor neurons 
drives an action: eating the object on the table, pushing it off the table, or doing nothing, 
in which case the object is removed from the table after a delay. Each sensory neuron fires if 
its assigned object feature is present: rounded shape, symmetrical shape, stem on top, no stem 
on top, smooth surface, rough surface, red, yellow or green color, and small, medium or large 
size (12 features in total, one feature per sensory neuron). 

The sensory neurons connect to the motor neuron dendrites at random locations, forming 
N clusters of C excitatory synapses on each motor neuron. A cluster is defined as excited if 
all C of its synapses are excited. Each cluster is initially assigned a memory weight of 0. A 
neuron fires in a "learned" excitation if at least M of its clusters with memory weights of at 
least 1 are excited. A memory weight of 0.25 is added to a cluster for eating an apple, and 0.1 
for pushing an object off the table, if 1) all the cluster's synapses are excited, 2) this is 
immediately followed by a trial firing of the cluster's neuron, and 3) this is immediately 
followed by a positive emotional response. A cluster's memory weight is reset to 0 if 1) all the 
cluster's synapses are excited, 2) this is immediately followed by a trial or nontrial (learned) 
firing of the cluster's neuron, and 3) this is immediately followed by a negative emotional 
response. A positive emotional response is generated for eating an apple or pushing an object 
off the table; a negative emotional response is generated for eating a stone, doing nothing, or 
performing more than one action simultaneously (i.e., at least two motor neurons fire). After an 
object is placed on the table, the subject attempts to execute a memorized action. If there is 
no memorized action (i.e., fewer than M clusters with a memory weight of at least 1 are excited 
on each of the motor neurons), a random motor neuron fires in a trial firing. 

A computer program, AECLS (Autonomous Emotional Combination Learning System), 
implemented the above learning algorithm. To complicate the problem for the subject and to 
test its deductive reasoning, no green or large apples and no small or red stones were presented 
during learning, while green large apples and small red stones were presented during testing. 
Specifically, the subject was presented with a random sequence drawn from 8 objects: small red 
apple, small yellow apple, medium red apple, medium yellow apple, medium yellow stone, medium 
green stone, large yellow stone, large green stone. The objects were presented until the subject 
would have had a learned response, if tested, to each of the 4 test objects: large green apple, 
large red apple, small red stone, medium yellow stone. At this point, it was recorded whether 
the responses to the test objects would have been correct, i.e., eating apples and pushing 
stones off the table. In some cases the system was not able to learn responses to all 4 test 
objects even after a large number of trials. 
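The AECLS rules and protocol above can be sketched as a short Python simulation. This is a hypothetical reconstruction, not the original AECLS program. The names `run`, `features` and `learned` are ours. Cluster inputs are assumed to be drawn uniformly with repetition, memory weights are stored in hundredths so that ten 0.1 increments reach exactly 1, and a `max_steps` cap stands in for "a large number of trials". Its success rates should not be expected to reproduce the reported percentages exactly.

```python
import random

# One sensory neuron per feature (12 in total), as listed in the text.
FEATURES = ["rounded", "symmetrical", "stem", "no_stem", "smooth", "rough",
            "red", "yellow", "green", "small", "medium", "large"]
IDX = {name: i for i, name in enumerate(FEATURES)}
EAT, PUSH, NOTHING = 0, 1, 2  # the three motor neurons

def features(kind, size, color):
    """Indices of the sensory neurons that fire for an apple or a stone."""
    surface = ["stem", "smooth"] if kind == "apple" else ["no_stem", "rough"]
    return frozenset(IDX[f] for f in ["rounded", "symmetrical", *surface, size, color])

TRAIN = [("apple", "small", "red"), ("apple", "small", "yellow"),
         ("apple", "medium", "red"), ("apple", "medium", "yellow"),
         ("stone", "medium", "yellow"), ("stone", "medium", "green"),
         ("stone", "large", "yellow"), ("stone", "large", "green")]
TEST = [("apple", "large", "green"), ("apple", "large", "red"),
        ("stone", "small", "red"), ("stone", "medium", "yellow")]

def run(C=4, N=10_000, M=70, round_robin=True, max_steps=5_000, seed=None):
    """Return (all 4 test responses correct, or None if not learned; trial firings)."""
    rng = random.Random(seed)
    # N clusters of C synapses per motor neuron, inputs drawn with repetition.
    clusters = [[tuple(rng.randrange(12) for _ in range(C)) for _ in range(N)]
                for _ in range(3)]
    weights = [[0] * N for _ in range(3)]  # hundredths: 100 means weight 1.0
    active = {o: features(*o) for o in set(TRAIN + TEST)}
    # A cluster is excited by an object only if all C of its inputs fire.
    excited = {o: [[k for k, cl in enumerate(clusters[n]) if set(cl) <= fs]
                   for n in range(3)] for o, fs in active.items()}

    def learned(o):
        """Motor neurons with at least M excited clusters of weight >= 1."""
        return [n for n in range(3)
                if sum(weights[n][k] >= 100 for k in excited[o][n]) >= M]

    trials, next_neuron = 0, 0
    for _ in range(max_steps):
        if all(learned(o) for o in TEST):  # a learned response to every test object
            return all(learned(o) == [EAT if o[0] == "apple" else PUSH]
                       for o in TEST), trials
        o = rng.choice(TRAIN)
        fired = learned(o)
        is_trial = not fired
        if is_trial:  # nothing memorized: one motor neuron fires as a trial
            trials += 1
            if round_robin:
                fired, next_neuron = [next_neuron], (next_neuron + 1) % 3
            else:
                fired = [rng.randrange(3)]
        positive = (len(fired) == 1 and fired[0] != NOTHING
                    and not (fired[0] == EAT and o[0] == "stone"))
        for n in fired:
            for k in excited[o][n]:
                if not positive:
                    weights[n][k] = 0  # negative emotion resets the cluster
                elif is_trial:         # reward only strengthens trial firings
                    weights[n][k] += 25 if n == EAT else 10
    return None, trials  # failed to learn all 4 test responses
```

A call such as `run(seed=0)` returns whether all four test responses were correct (or `None` if learning did not finish) and the number of trial firings. Passing `round_robin=False` gives the purely random trial selection of the first experiment. Averaging the first element over many seeds yields a success rate that can be compared in kind, though not necessarily in value, with those reported in the text.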

For C = 4, N = 10,000 and M = 70, the subjects learned to pass all the tests correctly 
in 93.9% of cases (41.3 trial neuron firings on average prior to learning). In the other 6.1% 
of cases, the subjects typically learned the response of pushing both apples and stones off the 
table. This usually occurred when "pushing off" was, at random, tried many more times than 
"eating" when an apple was presented; the subjects therefore learned to "push off" apples 
before having tried to "eat" many of them. To make the trial neuron firings more regular, the 
algorithm was modified to select the firing neurons sequentially in a round-robin fashion. The 
subjects then learned to pass the tests correctly in 98.9% of cases (39.8 trials on average 
before learning). 

For C = 4, N = 10,000 and M = 1 (with round-robin motor neuron trials), the 
subjects learned the 4 correct responses in only 53.6% of cases. The most common cause of 
test failure was motor neurons being excited by rarely occurring clusters that represented 
low-dimensional object features. For example, a cluster with 2 inputs coming from the 
"rounded shape" sensory neuron and 2 inputs from the "red" sensory neuron would cause all 
rounded red objects to be classified as edible if the training object sequence happened to 
contain many red apples. Note that out of the 12^4 = 20,736 clusters representing all possible 
ordered selections (with repetition) of 4 out of 12 inputs, 1 cluster encodes the excitation of 
1 chosen sensory neuron, 14 clusters encode the excitation of 2 chosen sensory neurons, 36 the 
excitation of 3 chosen sensory neurons, and 24 the excitation of 4 chosen sensory neurons. 
Therefore, requiring a minimum number of excited clusters to fire a neuron assigned lower 
importance to one- and two-dimensional object features. 
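The cluster counts quoted above can be checked by brute-force enumeration. The sketch assumes, as the 12^4 total implies, that a cluster is an ordered 4-tuple of sensory inputs with repetition allowed. The helper name `clusters_covering_exactly` is illustrative.

```python
from itertools import product

def clusters_covering_exactly(chosen, n_inputs=12, C=4):
    """Count ordered C-tuples of inputs (repetition allowed) whose set of
    distinct inputs is exactly `chosen`."""
    target = frozenset(chosen)
    return sum(1 for t in product(range(n_inputs), repeat=C)
               if frozenset(t) == target)

counts = {k: clusters_covering_exactly(range(k)) for k in range(1, 5)}
print(counts)  # {1: 1, 2: 14, 3: 36, 4: 24}
```

These are the numbers of surjections from the 4 synapses of a cluster onto k chosen inputs, k!·S(4, k), where S(4, k) is a Stirling number of the second kind. Because a specific k-input combination is covered by more clusters as k grows (up to k = 4), a threshold on the number of excited clusters favors higher-dimensional feature conjunctions.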

Next, for each C from 1 to 4 (with round-robin motor neuron trials), the value of M optimal for 
learning was searched for, using N = 4 · 12^C so that all possible input combinations were 
likely to occur in the clusters. For C = 1, the test performance was best when M was equal to 
1, with the 4 correct test responses generated in only 56.1% of cases; for C = 2, all 4 test 
responses were correct in >97% of cases for M from 29 to 35 (which represented 5-6.1% of all 
clusters); for C = 3, in >97% of cases for M from 89 to 194 (1.3-2.8% of clusters); and for 
C = 4, in >97% of cases for M from 286 to 897 (0.34-1.08% of clusters). Clearly, 
the systems with combinatorial memory (C > 1) performed much better than those without. 
It is interesting that the range of ^ when the test success rate was greater than 97% was 
rather similar for all C from 2 to 4. As expected, if N was significantly decreased, the test 
performance deteriorated. For example, for C = 4, N = 1,000 and M = 7, the correct responses 
to the 4 tests were learned in 87.1% of cases. 
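As a quick arithmetic check, the cluster budgets and the M/N percentages reported above follow from N = 4 · 12^C. The helper `cluster_budget` is illustrative. Its last return value is our own addition under the uniform-draw assumption: each object activates 6 of the 12 sensory neurons, so a random cluster is excited with probability (6/12)^C = 2^-C.

```python
def cluster_budget(C, M_low, M_high, n_inputs=12):
    """Return N = 4 * n_inputs**C, the M range as a percentage of N, and the
    expected number of clusters one object excites (assumes uniform draws and
    6 active sensory neurons per object, i.e. probability 2**-C per cluster)."""
    N = 4 * n_inputs ** C
    return N, 100 * M_low / N, 100 * M_high / N, N / 2 ** C

for C, lo, hi in [(2, 29, 35), (3, 89, 194), (4, 286, 897)]:
    N, p_lo, p_hi, n_excited = cluster_budget(C, lo, hi)
    print(f"C={C}: N={N}, M/N = {p_lo:.2f}%-{p_hi:.2f}%, "
          f"~{n_excited:.0f} clusters excited per object")
```

The computed ranges agree, within rounding, with the 5-6.1%, 1.3-2.8% and 0.34-1.08% quoted in the text. The last column suggests that, at the optimum, M stays a modest fraction of the clusters an object actually excites.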

Although the AECLS algorithm is simple, it does suggest that learning in the emotion-modulated 
combinatorial switching framework can be rather efficient, via deduction of reusable abstract 
concepts. To deduce reusable abstract concepts, the system needs to learn in situations that 
display both these concepts and variability in other features. The system deduces the reusable 
concepts by accumulating weights for the synapse clusters that represent the concept features. 
Note that the resulting behavior can be described as "deductive reasoning" and would probably 
appear intelligent to an external observer. In real neuronal systems, the analogs of the 
parameters C, M and N would presumably evolve to suit a particular neuron's operating 
environment. 

In summary, it is suggested that pyramidal neurons can process information by switch- 
ing the neuron output based on active input neuron combinations. A trial-and-error learning 
paradigm is presented in which an (RPE-type) reward signal that itself may adjust over time 
modulates the combinatorial memory that stores learned behaviors. Experimental verification 
of the proposed mechanisms, including the putative mechanical, or muscle-like, contributions 
that may provide computational advantages to single-neuron combinatorial switching, is 
needed. 